# Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems


## Abstract


## 1. Introduction

## 2. Estimating Entropy

- 1. Uncertainty principle. When all k entries of $\overrightarrow{p}$ are equal, $H(\overrightarrow{p})$ should be a monotonic, increasing function of k.
- 2. Consistency under coarse-graining. $H({p}_{1},{p}_{2},{p}_{3})$ is equal to $H({p}_{1},{p}_{2}+{p}_{3})+({p}_{2}+{p}_{3})H({p}_{2},{p}_{3})$.

- 1′. Uncertainty principle. When all k entries of $\overrightarrow{n}$ are equal, $\widehat{H}(\overrightarrow{n})$ should be a monotonic, increasing function of k.
- 2′. Consistency under coarse-graining. $\widehat{H}({n}_{1},{n}_{2},{n}_{3})$ is equal to $\widehat{H}({n}_{1},{n}_{2}+{n}_{3})+\frac{{n}_{2}+{n}_{3}}{n}\widehat{H}({n}_{2},{n}_{3})$, where n is the total number of observations.
- 3′. Asymptotic convergence. As n goes to infinity, $\widehat{H}(\overrightarrow{n})\to H(\overrightarrow{p})$.
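Condition 2′ can be checked numerically. A minimal sketch (function names are ours, not THOTH's) showing that the naive plug-in estimator satisfies the coarse-graining condition exactly, with the convention that the entropy of unnormalized arguments is the entropy of their normalization:

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a normalized distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def entropy_hat(counts):
    """Naive (plug-in) entropy estimate from observed counts."""
    n = sum(counts)
    return entropy([c / n for c in counts])

# Condition 2' on counts (n1, n2, n3): coarse-graining consistency.
n1, n2, n3 = 4, 3, 9
n = n1 + n2 + n3
lhs = entropy_hat([n1, n2, n3])
rhs = entropy_hat([n1, n2 + n3]) + (n2 + n3) / n * entropy_hat([n2, n3])
assert abs(lhs - rhs) < 1e-12  # the plug-in estimator is exactly consistent
```

The plug-in estimator is consistent but badly biased at small n; the estimators compared in Section 5 trade exact consistency against bias in different ways.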

## 3. Distances between Distributions

#### 3.1. Kullback–Leibler Divergence

#### 3.2. Jensen–Shannon Divergence
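The Jensen–Shannon divergence has a compact closed form: the entropy of the equal-weight mixture minus the average entropy of the two components. A minimal sketch (function names are ours):

```python
import math

def H(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return H(m) - 0.5 * H(p) - 0.5 * H(q)

# Identical distributions diverge by zero; disjoint ones by a full bit.
assert abs(jsd([0.5, 0.5], [0.5, 0.5])) < 1e-12
assert abs(jsd([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-12
```

Unlike the Kullback–Leibler divergence of Section 3.1, this quantity is finite even when the supports of p and q only partially overlap, since the mixture m is non-zero wherever either component is.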

#### 3.3. The Bhattacharyya Bound

**Figure 2.** Prediction error curves and the existence of multiple classes. Solid curve: the Bhattacharyya bound for prediction of trial outcome for the period 1800 to 1820. Triangle symbols and solid line: actual prediction error, when drawing samples (words) from all trials within a class (guilty or not-guilty). As expected, the curve lies strictly below the Bhattacharyya bound. Diamond symbols and dashed line: actual prediction error, when drawing samples from a single trial. The prediction error actually rises (more samples lead to a less accurate prediction), suggesting that the underlying model (trials sample from one of two distributions) is incorrect. We restrict the set of trials here to those with at least one hundred (semantically-associated) words, so as to make the resampling process more accurate.
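The solid curve in Figure 2 is an instance of the Bhattacharyya bound on two-class prediction error. A hedged sketch of how such a bound can be computed (the function and its signature are ours, not the paper's code): for m independent draws the Bhattacharyya coefficient of the product distributions is the m-th power of the single-draw coefficient, which produces the curve's decay with sample count.

```python
import math

def bhattacharyya_bound(p, q, m=1, prior=0.5):
    """Upper bound on the two-class Bayes error from m independent draws.

    rho is the Bhattacharyya coefficient of the two class-conditional
    distributions; for m i.i.d. samples the coefficient multiplies,
    so the bound decays as rho**m."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(prior * (1 - prior)) * rho ** m

# Identical distributions carry no information: the bound stays at 0.5.
assert abs(bhattacharyya_bound([0.5, 0.5], [0.5, 0.5]) - 0.5) < 1e-12
# More samples tighten the bound whenever rho < 1.
assert (bhattacharyya_bound([0.9, 0.1], [0.1, 0.9], m=8)
        < bhattacharyya_bound([0.9, 0.1], [0.1, 0.9], m=1))
```

A rising empirical error curve, as in the single-trial case of Figure 2, cannot be produced by this bound for any fixed pair (p, q), which is what signals the failure of the two-distribution model.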

#### 3.4. Summary

## 4. Correlation, Dependency and Mutual Information

#### 4.1. Mutual Information

**Figure 3.** Predictability of the Kandahar and Helmand time streams. Top: a dramatic asymmetry on short timescales provides strong suggestion of anticipatory, and potentially causal, effects transmitted from Kandahar to Helmand province on rapid (less than two-week) timescales. Bottom: the consistent, opposite asymmetry is seen in the reverse process. A rise in the predictability of Kandahar by Helmand on longer (one-month) timescales, mirrored in the top panel, suggests potentially longer-term seasonal or constraint-based information common to both systems.
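The asymmetry in Figure 3 can be probed with a time-lagged mutual information between two discrete streams. A minimal sketch, for illustration only: it uses the naive plug-in estimator rather than the bias-corrected bootstrap estimator of Section 5, and the toy data and function names are ours.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between paired discrete series."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def lagged_mi(xs, ys, lag):
    """I(X_t ; Y_{t+lag}); compare lag and -lag to test for asymmetry."""
    if lag >= 0:
        return mutual_information(xs[:len(xs) - lag], ys[lag:])
    return lagged_mi(ys, xs, -lag)

# Toy example: y copies x one step later, so the forward-lag information
# is large while the reverse lag carries much less.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y = [0] + x[:-1]
assert lagged_mi(x, y, 1) > lagged_mi(x, y, -1)
```

On real data the plug-in values at each lag would be bias-corrected and given error bars before any asymmetry is interpreted as anticipatory structure.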

#### 4.2. The Data Processing Inequality

## 5. The Bootstrap Estimators In Practice

#### 5.1. The Bayesian Prior Hierarchy

- Draw a random integer, ${k}^{\prime}$, between k and ${k}^{2}$ inclusive.
- Draw a distribution ${p}^{\prime}$ with ${k}^{\prime}$ bins from ${D}_{\mathrm{NSB}}$. Then ${p}^{\prime}$ is approximately uniform in entropy over ${k}^{\prime}$ bins.
- Randomly partition the ${k}^{\prime}$ bins into k bins.
- Coarse grain the distribution ${p}^{\prime}$, given the partition of Step 3, to get p.

These steps use the `ranksb` and `rancom` algorithms from [31].
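The four steps above can be sketched as follows. This is an approximation, not the paper's implementation: the draw from ${D}_{\mathrm{NSB}}$ is replaced by a symmetric Dirichlet whose concentration is drawn log-uniformly (a crude way of spreading samples over the full entropy range), and the random partition is a simple stand-in for the `ranksb`/`rancom` algorithms of [31].

```python
import random

def sample_flat_entropy(kp):
    """Crude stand-in for a draw from D_NSB over kp bins (assumption:
    a symmetric Dirichlet with log-uniform concentration beta)."""
    beta = 10 ** random.uniform(-2, 2)
    w = [random.gammavariate(beta, 1.0) for _ in range(kp)]
    total = sum(w)
    return [x / total for x in w]

def sample_hierarchical_prior(k):
    """The four steps of Section 5.1, sketched."""
    # Step 1: random integer k' between k and k^2 inclusive.
    kp = random.randint(k, k * k)
    # Step 2: draw p' over k' bins.
    p_prime = sample_flat_entropy(kp)
    # Step 3: randomly partition the k' bins into k non-empty groups.
    labels = list(range(k)) + [random.randrange(k) for _ in range(kp - k)]
    random.shuffle(labels)
    # Step 4: coarse-grain p' over the partition to get p.
    p = [0.0] * k
    for label, mass in zip(labels, p_prime):
        p[label] += mass
    return p

p = sample_hierarchical_prior(4)
assert len(p) == 4 and abs(sum(p) - 1.0) < 1e-9
```

The coarse-graining in Step 4 is what makes the resulting prior approximately self-consistent under merging of categories, the property tested in Section 5.2.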

#### 5.2. Coarse-Graining Consistency

**Table 1.** RMS violations of coarse-graining consistency for entropy (Condition 2) for the Wolpert & Wolf (WW), Nemenman, Shafee & Bialek (NSB), and the bootstrap. The bootstrap estimator leads to a factor of ten or more improvement in coarse-graining consistency; as the amount of data increases, the bootstrap approaches full consistency faster. The average entropy of the three-state distributions is approximately 1.2 bits. These results are for the ${D}^{\prime}$ prior of Section 5.1.

| Sampling | WW (RMS bits) | NSB (RMS bits) | Bootstrap (RMS bits) |
|---|---|---|---|
| 1× | 0.2333 | 0.1376 | 0.0046 |
| 2× | 0.1697 | 0.0865 | 0.0031 |
| 4× | 0.1165 | 0.0463 | 0.0014 |
| 8× | 0.0720 | 0.0239 | 0.0008 |
| 16× | 0.0423 | 0.0121 | 0.0005 |

**Table 2.** RMS violations of coarse-graining consistency for mutual information (Equation (27)) for the Wolpert & Wolf (WW) estimator, Nemenman, Shafee & Bialek (NSB) for Mutual Information, and the bootstrap. The bootstrap estimator again leads to a factor of ten or more improvement in coarse-graining consistency; as the amount of data increases, the bootstrap approaches full consistency faster. The average mutual information of the $2\times 3$ distribution is approximately 0.25 bits.

| Sampling | WW (RMS bits) | NSB-MI (RMS bits) | Bootstrap (RMS bits) |
|---|---|---|---|
| 1× | 0.0316 | 0.0387 | 0.0026 |
| 2× | 0.0196 | 0.0323 | 0.0012 |
| 4× | 0.0119 | 0.0208 | 0.0006 |
| 8× | 0.0069 | 0.0124 | 0.0004 |
| 16× | 0.0039 | 0.0066 | 0.0003 |

#### 5.3. Bias Correction and the Reliability of Error Estimates

**Figure 4.** Left panel: bias for the 16-state entropy estimation case, under prior ${D}^{\prime}$. Dotted line: naive estimator; dashed line, *-symbol: Wolpert and Wolf estimator; dashed line, □-symbol: NSB estimator; solid line: bootstrap estimator. Right panel: one-sigma (solid line) and two-sigma (dashed line) error bar reliability; as the sampling factor increases, both rapidly approach their asymptotic values (thin horizontal lines). Average entropy for this prior is 2.4 bits.

**Figure 5.** Left panel: bias for the estimation of mutual information on a $4\times 4$ joint probability distribution, under prior ${D}^{\prime}$. Dotted line: naive estimator; dashed line, *-symbol: Wolpert and Wolf estimator; dashed line, □-symbol: NSB estimator; solid line: bootstrap estimator. Right panel: one-sigma (solid line) and two-sigma (dashed line) error bar reliability; as the sampling factor increases, both rapidly approach their asymptotic values (thin horizontal lines). Average mutual information under this prior is 0.55 bits.

**Figure 6.** Left panel: bias for the estimation of mutual information on a $16\times 16$ joint probability distribution, under prior ${D}^{\prime}$. Dotted line: naive estimator; dashed line, *-symbol: Wolpert and Wolf estimator; dashed line, □-symbol: NSB estimator; solid line: bootstrap estimator. Right panel: one-sigma (solid line) and two-sigma (dashed line) error bar reliability; as the sampling factor increases, both rapidly approach their asymptotic values (thin horizontal lines). Average mutual information for this prior is 1.33 bits.
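The bias corrections evaluated in Figures 4–6 can be illustrated with a first-order bootstrap correction: subtract from the naive estimate the average bias observed when resampling from the empirical distribution itself. This is a generic sketch of the technique (names and details are ours), not the paper's full estimator.

```python
import math
import random
from collections import Counter

def entropy_hat(counts):
    """Naive plug-in entropy estimate, in bits."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def bootstrap_entropy(samples, n_boot=1000, seed=0):
    """First-order bootstrap bias correction with a one-sigma error bar.

    Bootstrap replicates of the plug-in estimate are biased low relative
    to the empirical value; subtracting that average bias gives
    2*H_naive - mean(H_boot)."""
    rng = random.Random(seed)
    h_naive = entropy_hat(list(Counter(samples).values()))
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]
        reps.append(entropy_hat(list(Counter(resample).values())))
    mean_rep = sum(reps) / n_boot
    h_corrected = 2 * h_naive - mean_rep  # subtract the estimated bias
    sigma = (sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1)) ** 0.5
    return h_corrected, sigma

# Example: 16 draws from a uniform four-state source (true H = 2 bits);
# the correction pushes the estimate back up toward the true value.
h, sigma = bootstrap_entropy([i % 4 for i in range(16)])
```

The right panels of Figures 4–6 measure the complementary question: how often the true value actually falls within the one- and two-sigma intervals the resampling produces.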

**Table 3.** RMS errors for estimation of the entropy of a 16-state system, for the Wolpert & Wolf (WW) estimator, the Nemenman, Shafee & Bialek (NSB) estimator, and the bootstrap. The bootstrap estimator has RMS errors comparable to the NSB.

| Sampling | WW (RMS bits) | NSB (RMS bits) | Bootstrap (RMS bits) |
|---|---|---|---|
| 1× | 1.0442 | 0.3305 | 0.3605 |
| 2× | 0.8302 | 0.2223 | 0.2304 |
| 4× | 0.5990 | 0.1537 | 0.1564 |
| 8× | 0.3854 | 0.1039 | 0.1051 |
| 16× | 0.2341 | 0.0730 | 0.0734 |

**Table 4.** RMS errors for estimation of the mutual information of a $4\times 4$ joint probability, for the Wolpert & Wolf (WW) estimator, the Nemenman, Shafee & Bialek (NSB) estimator for mutual information, and the bootstrap. The bootstrap estimator has RMS errors comparable to the NSB.

| Sampling | WW (RMS bits) | NSB (RMS bits) | Bootstrap (RMS bits) |
|---|---|---|---|
| 1× | 0.3106 | 0.1924 | 0.2783 |
| 2× | 0.2494 | 0.1446 | 0.1722 |
| 4× | 0.1851 | 0.1041 | 0.1132 |
| 8× | 0.1277 | 0.0727 | 0.0761 |
| 16× | 0.0845 | 0.0523 | 0.0534 |

**Table 5.** RMS errors for estimation of the mutual information of a $16\times 16$ joint probability, for the Wolpert & Wolf (WW) estimator, the Nemenman, Shafee & Bialek (NSB) estimator for mutual information, and the bootstrap. The bootstrap estimator has RMS errors comparable to the NSB.

| Sampling | WW (RMS bits) | NSB (RMS bits) | Bootstrap (RMS bits) |
|---|---|---|---|
| 1× | 0.6920 | 0.1028 | 0.1417 |
| 2× | 0.5267 | 0.0685 | 0.0642 |
| 4× | 0.3696 | 0.0417 | 0.0373 |
| 8× | 0.2409 | 0.0260 | 0.0251 |
| 16× | 0.1487 | 0.0175 | 0.0173 |

#### 5.4. Summary

## 6. Conclusions

## Acknowledgements

## Conflict of Interest

## Appendix

## A. The Trial of John Long, as Reported on 18 September 1820

## B. SIGACT for IED Explosion, 3 April 2005, Kandahar Province

**Title**

**Text**

**Additional Data**

`{:reportkey=>"DCEAC77F-A84D-45F3-88B3-33B9B5A95B20", :type=>:explosivehazard, :category=>:iedexplosion, :trackingnumber=>:2007-033 -005423-0737, :region=>:rcsouth, :attackon=>:enemy, :complexattack=>false, :reportingunit=>:other, :unitname=>:other, :typeofunit=>:noneselected, :friendlywia=>0, :friendlykia=>0, :hostnationwia=>0, :hostnationkia=>1, :civilianwia=>0, :civiliankia=>1, :enemywia=>0, :enemykia=>0, :enemydetained=>0, :mgrs=>"42RTV5110332180", :latitude=>30.99694061, :longitude=>66.39333344, :originatorgroup=>:unknown, :updatedbygroup=>:unknown, :ccir=>:"", :sigact=>:"", :affiliation=>:enemy, :dcolor=>:red, :classification=>:secret, :start=>2005-04-03 03:15:00 UTC, :province=>:kandahar, :district=>:spinboldak, :nearestgeocode=>"AF241131834", :nearestname=>"Spin Boldak"}`

**Source and Post-processing**

## C. Additional Characterizations

**Figure A1.** Estimator bias as a function of the entropy of the underlying distribution for the naive (dotted line), NSB (dashed line) and bootstrap (solid line) estimators. Distributions are over sixteen categories, drawn from ${D}^{\prime}$, and binned in 0.25 bit increments; the bias is for estimates made with sixteen samples (i.e., $1\times $ oversampling). Ranges shown are one-sigma error bars for the bias in the bin. As can be seen, all estimators tend to overestimate small entropies, and underestimate large entropies, with the cross-over point (and overall bias) depending on the method. As in the main text, Section 5.3, we neglect cases where the empirical distribution has entropy zero; this is one source of the positive bias at the lowest entropy bins.

**Figure A2.** The distribution of entropies for distributions sampled from the ${D}_{1}$ (dotted line), ${D}_{\mathrm{NSB}}$ (dashed line) and ${D}^{\prime}$ (solid line) priors. The ${D}_{1}$ prior produces distributions very strongly skewed towards high entropies, while the ${D}_{\mathrm{NSB}}$ distribution is nearly flat for entropies larger than one bit.

## References

1. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423, 623–656.
2. Paninski, L. Estimation of entropy and mutual information. Neural Comput. **2003**, 15, 1191–1253.
3. Collaboration “Institution Formation, Semantic Phenomena, and Three Hundred Years of the British Criminal Court System”, with participants Simon DeDeo, Sara Klingenstein, and Tim Hitchcock.
4. This is not exactly true, since we use part-of-speech information for our semantic classifier, and the means (a hidden Markov model) for identifying a word’s lexical category are sensitive to order and context: compare “the dogs bit the sailor” [‘dogs’ as plural noun] and “the sailor dogs the waitress” [‘dogs’ as verb]. The extent to which syntactical features convey semantic information is an interesting and open question.
5. It is possible to correct the magic word problem post hoc, merging categories until no magic words remain. If this rule is specified with sufficient generality, it allows for bootstrap estimation. We do not consider this approach here.
6. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory **1991**, 37, 145–151.
7. Nielsen, F. A family of statistical symmetric divergences based on Jensen’s inequality. arXiv **2010**.
8. Hellman, M.; Raviv, J. Probability of error, equivocation, and the Chernoff bound. IEEE Trans. Inform. Theory **1970**, 16, 368–372.
9. Jaynes, E.; Bretthorst, G. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
10. Cover, T.; Thomas, J. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
11. Chen, C.H. On information and distance measures, error bounds, and feature selection. Inform. Sci. **1976**, 10, 159–173.
12. Ito, T. Approximate error bounds in pattern recognition. Mach. Intell. **1972**, 7, 369–376.
13. Hashlamoun, W.A.; Varshney, P.K.; Samarasooriya, V.N.S. A tight upper bound on the Bayesian probability of error. IEEE Trans. Inform. Theory **1994**, 16, 220–224.
14. We can estimate ρ itself by means of the statistical bootstrap. Resampling suggests that ρ itself may suffer from bias, which can be corrected for.
15. Consider, for example, the case where the supports of ${p}_{1}$ and ${p}_{2}$ only partially overlap: ${p}_{1}(x)$ is zero for some x while ${p}_{2}(x)$ is non-zero, and vice versa for some y. The draw {x, y} is not possible for either ${p}_{1}$ or ${p}_{2}$, but is possible for the average.
16. O’Loughlin, J.; Witmer, F.; Linke, A.; Thorwardson, N. Peering into the Fog of War: The Geography of the WikiLeaks Afghanistan War Logs 2004–2009. Eurasian Geogr. Econ. **2010**, 51, 472–495.
17. Zammit-Mangion, A.; Dewar, M.; Kadirkamanathan, V.; Sanguinetti, G. Point process modelling of the Afghan War Diary. Proc. Natl. Acad. Sci. USA **2012**, 109, 12414–12419.
18. There are more than 78,000 SIGACTs in the original data set. We removed approximately 20,000 because they expressed a level of ambiguity about what happened (“unknown-initiated action” or “suspicious incident”), because their GPS coordinates did not match the description, because they were not time-sensitive (“weapons cache found”), or because they were irrelevant to our study (“IED hoax” or “show of force”). Information on initiative is explicitly included in the SIGACT data and is extracted by the methods of Ref. [16].
19. Sanín, F.G.; Giustozzi, A. Networks and Armies: Structuring Rebellion in Colombia and Afghanistan. Stud. Confl. Terror. **2010**, 33, 836–853.
20. Gutierrez Sanin, F. Telling the Difference: Guerrillas and Paramilitaries in the Colombian War. Polit. Soc. **2008**, 36, 3–34.
21. Green, A.H. Repertoires of Violence Against Non-combatants: The Role of Armed Group Institutions and Ideologies. Ph.D. Thesis, Yale University, New Haven, CT, USA, 2011.
22. Collaboration “The Emergence of Insurgency in Afghanistan: An Information Theoretic Analysis”, with participants Simon DeDeo and Robert Hawkins.
23. Hutter, M. Distribution of Mutual Information. arXiv **2001**.
24. Zaffalon, M.; Hutter, M. Robust Feature Selection by Mutual Information Distributions. In Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence, Edmonton, Canada, 1–4 August 2002; pp. 577–584.
25. Williams, P.L.; Beer, R.D. Generalized Measures of Information Transfer. arXiv **2011**.
26. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. **2000**, 85, 461–464.
27. THOTH. Available online: http://thoth-python.org (accessed on 22 May 2013).
28. Nemenman, I.; Shafee, F.; Bialek, W. Entropy and Inference, Revisited. In Advances in Neural Information Processing Systems 14; Dietterich, T.G., Becker, S., Ghahramani, Z., Eds.; MIT Press: Cambridge, MA, USA, 2002.
29. Nemenman, I.; Bialek, W.; de Ruyter van Steveninck, R. Entropy and information in neural spike trains: Progress on the sampling problem. Phys. Rev. E **2004**, 69, 56111:1–56111:6.
30. Wolpert, D.H.; Wolf, D.R. Estimating functions of probability distributions from a finite set of samples. Phys. Rev. E **1995**, 52, 6841–6854.
31. Nijenhuis, A.; Wilf, H.S. Combinatorial Algorithms for Computers and Calculators, 2nd ed.; Academic Press: New York, NY, USA, 1978.
32. Care needs to be taken, however, since a general stochastic function of the underlying process will not, in general, preserve independence in the empirical distribution. It is always possible, for example, to choose an instantiation of a stochastic re-mapping post hoc that magnifies accidental correlations found in the original empirical distribution. We thank John Geanakoplos for pointing this out to us.
33. DeDeo, S. Effective theories for circuits and automata. Chaos **2011**, 21.
34. Olshausen, B.A.; Field, D.J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Res. **1997**, 37, 3311–3325.
35. Olshausen, B.A.; Field, D.J. Sparse coding of sensory inputs. Curr. Opin. Neurobiol. **2004**, 14, 481–487.
36. Daniels, B.C.; Krakauer, D.C.; Flack, J.C. Sparse code of conflict in a primate society. Proc. Natl. Acad. Sci. USA **2012**, 109, 14259–14264.
37. Plato. Phaedrus; Harvard University Press: Cambridge, MA, USA, 1925; Plato in Twelve Volumes, Volume 9, Fowler, Harold N., Translator. Available online: http://www.perseus.tufts.edu/hopper/text?doc=Plat.+Phaedrus+265e (accessed on 22 May 2013).
38. Afghan War Diary. Available online: http://wikileaks.org/afg/event/2005/04/AFG20050403n68.html (accessed on 4 June 2013).
39. Miller, G. Note on the Bias of Information Estimates. In Information Theory in Psychology II-B; Quastler, H., Ed.; Free Press: Glencoe, IL, USA, 1955.
40. MacKinnon, J.G.; Smith, A.A., Jr. Approximate bias correction in econometrics. J. Econometrics **1998**, 85, 205–230.
41. MacKinnon, J.G. Bootstrap Methods in Econometrics. Econ. Rec. **2006**, 82, S2–S18.

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

DeDeo, S.; Hawkins, R.X.D.; Klingenstein, S.; Hitchcock, T. Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems. *Entropy* **2013**, *15*, 2246-2276.
https://doi.org/10.3390/e15062246
